Search | WHO COVID-19 Research Database

1.

Validation of Real-World Case Definitions for COVID-19 Diagnosis and Severe COVID-19 Illness Among Patients Infected with SARS-CoV-2: Translation of Clinical Trial Definitions to Real-World Settings (preprint)

Mei Sheng Duh; Catherine Nguyen; Heather Rubino; Christopher Herrick; Rose Chang; Maral DerSarkissian; Yichuan Grace Hsieh; Azeem Banatwala; Louise H. Yu; Gregory Belsky; Marykate E. Murphy; Janet Boyle-Kelly; Andrew Cagan; Bruce E. Stangle; Pierre Y. Cremieux; Francesca Kolitsopoulos; Shawn N. Murphy.

medrxiv; 2023.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2023.09.12.23295441

ABSTRACT

Purpose: This study assessed the performance of the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) coronavirus disease 2019 (COVID-19) diagnostic code U07.1 against polymerase chain reaction (PCR) test results (Objective 1), and of an electronic medical record (EMR)-based codified algorithm for severe COVID-19 illness, based on endpoints used in the Pfizer-BioNTech COVID-19 vaccine trial, against chart review (Objective 2). Methods: This retrospective, longitudinal cohort study used EMR data from the Mass General Brigham COVID-19 Data Mart (3/1/2020-11/19/2020) for adult patients with ≥1 PCR test, antigen test, or code U07.1 (Objective 1) and for adult patients with a positive PCR test who were hospitalized with COVID-19 (Objective 2). Results: Among 354,124 patients in Objective 1, 96% had ≥1 PCR test (including 6% with ≥1 positive PCR test and 11% with ≥1 code U07.1). Code U07.1 had low sensitivity (54%) and positive predictive value (PPV; 63%) but high specificity (97%) against the PCR test. Among 300 randomly sampled patients hospitalized for COVID-19 whose charts were reviewed in Objective 2, the EMR-based case definition for severe COVID-19 illness had a high PPV (>95%), outperforming the severe/critical COVID-19 endpoints defined by the World Health Organization (PPV: 79%). Conclusions: COVID-19 diagnosis based on ICD-10-CM code U07.1 had inadequate sensitivity and requires confirmation by PCR testing. The EMR-based case definition showed a high PPV and can be used to identify cases of severe COVID-19 illness in real-world datasets. These findings highlight the importance of validating outcomes in real-world data and can guide researchers analyzing COVID-19 data when PCR tests are not readily available.
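The sensitivity, specificity, and PPV reported for Objective 1 are read off a 2x2 cross-tabulation of code U07.1 against the PCR result. A minimal sketch, assuming each patient is counted once (patient-level rather than test-level units is an assumption here); the class and function names are hypothetical and the counts are made up, not the study's data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Patient-level cross-tabulation of a diagnosis code against a reference test."""
    tp: int  # code present, reference test positive
    fp: int  # code present, reference test negative
    fn: int  # code absent, reference test positive
    tn: int  # code absent, reference test negative


def validation_metrics(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV of the code against the reference test."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# Hypothetical counts for illustration only; they are not the study's data.
example = TwoByTwo(tp=8, fp=2, fn=2, tn=88)
print(validation_metrics(example))
```

Because PPV depends on prevalence, a code with high specificity can still have a modest PPV when only a small share of tested patients is PCR-positive, as in this cohort.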

Subject(s)

COVID-19

2.

tSPM+; a high-performance algorithm for mining transitive sequential patterns from clinical data (preprint)

Jonas Hügel; Ulrich Sax; Shawn N. Murphy; Hossein Estiri.

arxiv; 2023.

Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2309.05671v1

ABSTRACT

The increasing availability of large clinical datasets collected from patients can enable new avenues for the computational characterization of complex diseases using different analytic algorithms. One promising new method for extracting knowledge from large clinical datasets is temporal pattern mining integrated with machine learning workflows. However, mining these temporal patterns is computationally intensive and memory-demanding. Current algorithms, such as the temporal sequence pattern mining (tSPM) algorithm, already provide promising outcomes but still leave room for optimization. In this paper, we present the tSPM+ algorithm, a high-performance implementation of the tSPM algorithm that adds a new dimension by including the duration in the temporal patterns. We show that tSPM+ provides a speed-up of up to a factor of 980 and an up to 48-fold improvement in memory consumption. Moreover, we provide a Docker container with an R package, along with vignettes for easy integration into existing machine learning workflows, and we use the mined temporal sequences to identify post-COVID-19 patients and their symptoms according to the WHO definition.
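The core idea of transitive sequential patterns with durations can be illustrated in a few lines: every ordered pair of events in a patient's record becomes a candidate sequence, annotated with the time between the two events. This is a deliberately naive sketch, not the tSPM+ implementation or its R package API; the function names, the day-level duration unit, the choice to skip pairs of identical codes, and the example codes are all assumptions.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def transitive_sequences(events):
    """Turn one patient's (code, date) events into (earlier, later, gap_days) triples.

    Every ordered pair of events yields a sequence, not only adjacent ones,
    which is what makes the patterns "transitive". Skipping pairs of identical
    codes is an assumption of this sketch. Same-date events keep their input
    order because sorted() is stable.
    """
    ordered = sorted(events, key=lambda event: event[1])
    return [
        (code_a, code_b, (day_b - day_a).days)
        for (code_a, day_a), (code_b, day_b) in combinations(ordered, 2)
        if code_a != code_b
    ]


def pattern_support(patients):
    """Count how many patients exhibit each earlier -> later pattern (durations ignored)."""
    support = Counter()
    for events in patients.values():
        # A set, so each patient contributes at most once per pattern.
        support.update({(a, b) for a, b, _ in transitive_sequences(events)})
    return support


# Hypothetical records for illustration only.
patients = {
    "p1": [("U07.1", date(2020, 4, 1)), ("cough", date(2020, 3, 28)),
           ("fatigue", date(2020, 6, 15))],
    "p2": [("U07.1", date(2020, 5, 2)), ("fatigue", date(2020, 8, 1))],
}
print(transitive_sequences(patients["p1"]))
print(pattern_support(patients).most_common(1))
```

The number of pairs grows quadratically with the number of events per patient, which suggests why speed and memory consumption are the main targets for optimization.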

Subject(s)

COVID-19

3.

Distinguishing Admissions Specifically for COVID-19 from Incidental SARS-CoV-2 Admissions: A National EHR Research Consortium Study (preprint)

Jeffrey G Klann; Zachary H Strasser; Meghan R Hutch; Chris J Kennedy; Jayson S Marwaha; Michele Morris; Malarkodi Jebathilagam Samayamuthu; Ashley C Pfaff; Hossein Estiri; Andrew M South; Griffin M Weber; William Yuan; Paul Avillach; Kavishwar B Wagholikar; Yuan Luo; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Gilbert S. Omenn; Shyam Visweswaran; John H Holmes; Zongqi Xia; Gabriel A Brat; Shawn N Murphy.

medrxiv; 2022.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2022.02.10.22270728

ABSTRACT

Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications who have an incidentally positive test could be misclassified as COVID-19 hospitalizations. EHR-based studies have been unable to distinguish a hospitalization specifically for COVID-19 from an incidental SARS-CoV-2 hospitalization. From a retrospective EHR-based cohort in four US healthcare systems, a random sample of 1,123 SARS-CoV-2 PCR-positive patients hospitalized between 3/2020 and 8/2021 was manually chart-reviewed and classified as admitted with COVID-19 (incidental) or admitted specifically for COVID-19 (for-COVID-19). Incidental admissions made up 26% of the sample, and EHR-based phenotyped feature sets filtered them out. The top site-specific feature sets had 79-99% specificity with 62-75% sensitivity, while the best-performing across-site feature set had 71-94% specificity with 69-81% sensitivity. A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated the two types of admission, which is important for ensuring accurate public health reporting and research.

Subject(s)

COVID-19, Severe Acute Respiratory Syndrome

4.

SurvMaximin: Robust Federated Approach to Transporting Survival Risk Prediction Models (preprint)

Xuan Wang; Harrison G Zhang; Xin Xiong; Chuan Hong; Griffin M Weber; Gabriel A Brat; Clara-Lea Bonzel; Yuan Luo; Rui Duan; Nathan P Palmer; Meghan R Hutch; Alba Gutiérrez-Sacristán; Riccardo Bellazzi; Luca Chiovato; Kelly Cho; Arianna Dagliati; Hossein Estiri; Noelia García-Barrio; Romain Griffier; David A Hanauer; Yuk-Lam Ho; John H Holmes; Mark S Keller; Jeffrey G Klann; Sehi L'Yi; Sara Lozano-Zahonero; Sarah E Maidlow; Adeline Makoudjou; Alberto Malovini; Bertrand Moal; Jason H Moore; Michele Morris; Danielle L Mowery; Shawn N Murphy; Antoine Neuraz; Kee Yuan Ngiam; Gilbert S Omenn; Lav P Patel; Miguel Pedrera-Jiménez; Andrea Prunotto; Malarkodi Jebathilagam Samayamuthu; Fernando J Sanz Vidorreta; Emily R Schriver; Petra Schubert; Pablo Serrano-Balazote; Andrew M South; Amelia LM Tan; Byorn W.L. Tan; Valentina Tibollo; Patric Tippmann; Shyam Visweswaran; Zongqi Xia; William Yuan; Daniela Zöller; Isaac S Kohane; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Paul Avillach; Zijian Guo; Tianxi Cai.

medrxiv; 2022.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2022.02.03.22270410

ABSTRACT

Objective: For multi-center heterogeneous real-world data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. Materials and Methods: For each center from which we want to borrow information to improve prediction performance for the target population, a penalized Cox model is fitted to estimate that center's feature coefficients. Using the estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimate of the feature coefficients for the target population. The target population can be an entire cohort composed of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. Results: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy than the estimator using only the target site's information and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which yields significantly improved estimates for target sites with fewer observations. Conclusions: The SurvMaximin method is well suited to both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin requires only a one-time exchange of summary information from participating centers. Although the estimated regression vectors can be very heterogeneous, SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
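The abstract describes combining per-center penalized Cox coefficients with the target population's covariance matrix. A sketch of a generic maximin aggregation step of that kind, assuming the standard maximin-effect formulation: choose simplex weights γ over the site coefficient vectors (the columns of B) that minimize γᵀBᵀΣBγ, where Σ is the target covariance, and return Bγ. SurvMaximin's exact estimator, any regularization it adds, and the site-level Cox fits are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]
    cumulative = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > cumulative - 1)[0][-1]
    theta = (cumulative[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def maximin_aggregate(site_coefs, target_cov, n_iter=5000):
    """Combine per-site coefficient vectors into one maximin-style estimate.

    site_coefs: (p, L) array, one column of Cox coefficients per center.
    target_cov: (p, p) feature covariance of the target population.
    Minimizes gamma' G gamma over the simplex, with G = B' Sigma B, by
    projected gradient descent, and returns (B @ gamma, gamma).
    """
    B = np.asarray(site_coefs, dtype=float)
    G = B.T @ np.asarray(target_cov, dtype=float) @ B
    gamma = np.full(G.shape[0], 1.0 / G.shape[0])
    # Step size 1 / L, where L = 2 * ||G||_2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(G, 2) + 1e-12)
    for _ in range(n_iter):
        gamma = project_to_simplex(gamma - step * 2.0 * G @ gamma)
    return B @ gamma, gamma


# Two hypothetical centers with conflicting coefficient vectors.
beta, weights = maximin_aggregate(np.eye(2), np.eye(2))
print(beta, weights)
```

Because the weights lie on the simplex, the result is always a convex combination of the site estimates, and only summary coefficients plus the target covariance are needed, consistent with the one-time summary exchange the abstract describes.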

Subject(s)

Leishmaniasis, Cutaneous

5.

International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study (preprint)

Griffin M Weber; Chuan Hong; Nathan P Palmer; Paul Avillach; Shawn N Murphy; Alba Gutiérrez-Sacristán; Zongqi Xia; Arnaud Serret-Larmande; Antoine Neuraz; Gilbert S. Omenn; Shyam Visweswaran; Jeffrey G Klann; Andrew M South; Ne Hooi Will Loh; Mario Cannataro; Brett K Beaulieu-Jones; Riccardo Bellazzi; Giuseppe Agapito; Mario Alessiani; Bruce J Aronow; Douglas S Bell; Antonio Bellasi; Vincent Benoit; Michele Beraghi; Martin Boeker; John Booth; Silvano Bosari; Florence T Bourgeois; Nicholas W Brown; Luca Chiovato; Lorenzo Chiudinelli; Arianna Dagliati; Batsal Devkota; Robert W Follett; Thomas Ganslandt; Noelia García Barrio; Tobias Gradinger; Romain Griffier; David A Hanauer; John H Holmes; Petar Horki; Kenneth M Huling; Richard W Issitt; Vianney Jouhet; Mark S Keller; Detlef Kraska; Molei Liu; Yuan Luo; Alberto Malovini; Kenneth D Mandl; Chengsheng Mao; Anupama Maram; Thomas Maulhardt; Bucalo Mauro; Marianna Milano; Jason H Moore; Jeffrey S Morris; Michele Morris; Danielle L Mowery; Thomas P Naughton; Kee Yuan Ngiam; James B Norman; Lav P Patel; Miguel Pedrera Jimenez; Emily R Schriver; Luigia Scudeller; Neil J Sebire; Pablo Serrano Balazote; Anastasia Spiridou; Amelia LM Tan; Byorn W.L. Tan; Valentina Tibollo; Carlo Torti; Enrico M Trecarichi; Maria Trecarichi; Michele Vitacca; Alberto Zambelli; Chiara Zucco; - Consortium for Clinical Characterization of COVID-19 by EHR; Isaac S Kohane; Tianxi Cai; Gabriel A Brat.

medrxiv; 2020.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.12.16.20247684

ABSTRACT

Objectives: To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease, and to identify the optimal timing of laboratory value collection to predict severity across hospitals and regions. Design: Retrospective cohort study. Setting: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe. Participants: Patients hospitalized with COVID-19, admitted before or after a PCR-confirmed result for SARS-CoV-2. Primary and secondary outcome measures: Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC) and compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction. Results: Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients were 50 years of age or older (78.7%) and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin were associated with disease severity. Laboratory values at admission differed significantly between the two groups. With the exception of D-dimer, the predictive discrimination of laboratory values did not improve after admission. Subgroup analysis using age, D-dimer, CRP, and lymphocyte count as predictors of severity at admission showed discrimination similar to that of a published algorithm (AUC = 0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed. On average, no difference in severity prediction was found between North American and European sites. Conclusions: Laboratory test values at admission can be used to predict severity in patients with COVID-19. There is a need for prediction models that perform well over the course of the disease in hospitalized patients.
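The predictive accuracy of each laboratory test above is summarized as an AUC. When a single laboratory value is used as the score, the AUC equals the Mann-Whitney probability that a randomly chosen ever-severe patient has a higher value than a randomly chosen never-severe patient, with ties counted as one half. A minimal O(n·m) sketch; the function name and input values are hypothetical, and for markers where lower values indicate severity (such as albumin) the values would be negated first.

```python
def auc_mann_whitney(severe_scores, non_severe_scores):
    """AUC as the probability that an ever-severe patient scores higher than a
    never-severe patient, counting ties as one half."""
    if not severe_scores or not non_severe_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for s in severe_scores:
        for n in non_severe_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(severe_scores) * len(non_severe_scores))


# Hypothetical admission CRP values (mg/L), for illustration only.
print(auc_mann_whitney([120.0, 85.0, 60.0], [40.0, 70.0, 15.0]))
```

Computing this separately on values drawn at admission and at later days would show whether discrimination improves or deteriorates over the hospital stay.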

Subject(s)

COVID-19

6.

Validation of a Derived International Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data (preprint)

Jeffrey G. Klann; Griffin M Weber; Hossein Estiri; Bertrand Moal; Paul Avillach; Chuan Hong; Victor M Castro; Thomas Maulhardt; Amelia LM Tan; Alon Geva; Brett K Beaulieu-Jones; Alberto Malovini; Andrew M South; Shyam Visweswaran; Gilbert S Omenn; Kee Yuan Ngiam; Kenneth D Mandl; Martin Boeker; Karen L Olson; Danielle L Mowery; Michele Morris; Robert W Follett; David A Hanauer; Riccardo Bellazzi; Jason H Moore; Ne Hooi Will Loh; Douglas S Bell; Kavishwar Wagholikar; Luca Chiovato; Valentina Tibollo; Siegbert Rieg; Anthony LLJ Li; Vianney Jouhet; Emilly Schriver; Malarkodi J Samayamuthu; Zongqi Xia; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Isaac S Kohane; Gabriel A Brat; Shawn N Murphy.

medrxiv; 2020.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.10.13.20201855

ABSTRACT

Introduction. The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) includes hundreds of hospitals internationally that use a federated computational approach to COVID-19 research with EHR data. Objective. We sought to develop and validate a standard definition of COVID-19 severity from readily accessible EHR data across the Consortium. Methods. We developed an EHR-based severity algorithm and validated it on patient hospitalization data from 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also used a machine learning approach at one site to compare selected predictors of severity with the 4CE algorithm. Results. The 4CE severity algorithm achieved a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. Single code categories for acuity were unacceptably inaccurate, with sensitivity varying by up to 0.65 across sites. A multivariate machine learning approach identified codes yielding a mean AUC of 0.956 (95% CI: 0.952, 0.959), compared with 0.903 (95% CI: 0.886, 0.921) using expert-derived codes. Billing codes were poor proxies of ICU admission, with 49% precision and recall against chart review at one partner institution. Discussion. We developed a proxy measure of severity that proved resilient to coding variability internationally by using a set of 6 code classes. In contrast, machine learning approaches may tend to overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold standard outcomes, possibly due to pandemic conditions. Conclusion. We developed an EHR-based algorithm for COVID-19 severity and validated it at 12 international sites.

Subject(s)

COVID-19

7.

Individualized Prediction of COVID-19 Adverse outcomes with MLHO (preprint)

Hossein Estiri; Zachary H. Strasser; Shawn N. Murphy.

arxiv; 2020.

Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2008.03869v2

ABSTRACT

We developed MLHO (pronounced "melo"), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, to predict the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction on data from patients' past medical records (before their COVID-19 infection). MLHO's architecture enables parallel, outcome-oriented model calibration, in which different statistical learning algorithms and feature vectors are tested simultaneously to improve the prediction of health outcomes. Using clinical and demographic data from a large cohort of over 13,000 COVID-19-positive patients, we modeled the four adverse outcomes using about 600 features representing patients' pre-COVID health records and demographics. The mean ROC AUC for mortality prediction was 0.91, while prediction performance for ICU admission, hospitalization, and ventilation ranged between 0.80 and 0.81. We broadly describe the clusters of features used in modeling and their relative influence in predicting each outcome. Our results demonstrate that while demographic variables (namely age) are important predictors of adverse outcomes after a COVID-19 infection, incorporating past clinical records is vital for a reliable prediction model. As the COVID-19 pandemic unfolds around the world, adaptable and interpretable machine learning frameworks (like MLHO) are crucial to improve our readiness for potential future waves of COVID-19, as well as other novel infectious diseases that may emerge.

Subject(s)

COVID-19, Learning Disabilities, Death

8.

International Electronic Health Record-Derived COVID-19 Clinical Course Profile: The 4CE Consortium (preprint)

Gabriel A Brat; Griffin M Weber; Nils Gehlenborg; Paul Avillach; Nathan P Palmer; Luca Chiovato; James Cimino; Lemuel R Waitman; Gilbert S Omenn; Alberto Malovini; Jason H Moore; Brett K Beaulieu-Jones; Valentina Tibollo; Shawn N Murphy; Sehi L'Yi; Mark S Keller; Riccardo Bellazzi; David A Hanauer; Arnaud Serret-Larmande; Alba Gutierrez-Sacristan; John H Holmes; Douglas S Bell; Kenneth D Mandl; Robert W Follett; Jeffrey G Klann; Douglas A Murad; Luigia Scudeller; Mauro Bucalo; Katie Kirchoff; Jean Craig; Jihad Obeid; Vianney Jouhet; Romain Griffier; Sebastien Cossin; Bertrand Moal; Lav P Patel; Antonio Bellasi; Hans U Prokosch; Detlef Kraska; Piotr Sliz; Amelia LM Tan; Kee Yuan Ngiam; Alberto Zambelli; Danielle L Mowery; Emily Schiver; Batsal Devkota; Robert L Bradford; Mohamad Daniar; - APHP/Universities/INSERM COVID-19 research collaboration; Christel Daniel; Vincent Benoit; Romain Bey; Nicolas Paris; Anne Sophie Jannot; Patricia Serre; Nina Orlova; Julien Dubiel; Martin Hilka; Anne Sophie Jannot; Stephane Breant; Judith Leblanc; Nicolas Griffon; Anita Burgun; Melodie Bernaux; Arnaud Sandrin; Elisa Salamanca; Thomas Ganslandt; Tobias Gradinger; Julien Champ; Martin Boeker; Patricia Martel; Alexandre Gramfort; Olivier Grisel; Damien Leprovost; Thomas Moreau; Gael Varoquaux; Jill-Jenn Vie; Demian Wassermann; Arthur Mensch; Charlotte Caucheteux; Christian Haverkamp; Guillaume Lemaitre; Ian D Krantz; Sylvie Cormont; Andrew South; - The Consortium for Clinical Characterization of COVID-19 by EHR (4CE); Tianxi Cai; Isaac S Kohane.

medrxiv; 2020.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.04.13.20059691

ABSTRACT

We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across 5 countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on comorbidities and temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.

Subject(s)

COVID-19
